Biostatistics For Dummies, 2nd Edition (Monika Wahi, John Pezzullo)

CHAPTER 23 Survival Regression 335

per day has that 1.05 multiplication applied 20 times, which is like multiplying by

1 05²⁰

, which equals 2.65. One pack contains 20 cigarettes, so if you change the

units in which you record smoking levels from cigarettes per day to packs per day,

you would use units that are 20 times larger. In that case, the corresponding

regression coefficient is 20 times larger, and the HR is raised to the 20th power

(2.65 instead of 1.05 in this example).

And a two-pack-per-day smoker’s hazard increases by a factor of 2.65 over a

one-pack-per-day smoker. This translates to a 2 65²

increase (approximately

sevenfold) in the chances of dying at any instant for the smoker compared to a

nonsmoker.

Executing a Survival Regression

As with all statistical methods dealing with time-to-event data, your dependent

variable is actually a pair of variables:»

» ^{Event status:}^{The event status variable is coded this way:}

• ^{Equal to 1 if the event was known to occur during the observation period}

(uncensored)

• ^{Equal to 0 if the event didn’t occur during the observation period}^(censored)»

» ^{Time-to-event:}^{In participants who experienced the event during the}

observation period, this is the time from the start of observation to the

occurrence of the event. In participants who did not experience the event

during the observation period, this is the time from the start of observation to

the last time the participant was observed. We describe time-to-event data in

more detail in Chapter 21.

And as with all regression methods, you designate one or more variables as the

predictors. The rules for representing the predictor variables are the same as

described in Chapter 18:»

» ^{For continuous numerical variables, choose units of a convenient magnitude.»}

» ^{For categorical predictors, carefully consider how you recode the data,}

especially in terms of selecting a reference group. Consider a five-level age

group variable. Would you want to model it as an ordinal categorical variable,

assuming a linear relationship with the outcome? Or would you prefer using

indicator variables, allowing each level to have its own slope relative to the

reference level? Flip to Chapter 8 for more on recoding categorical variables.